This report investigates how student lifestyle factors — including study habits, sleep patterns, and gender — relate to self-reported daily anxiety levels. As concerns about student mental health and academic pressure continue to grow, identifying behavioral correlates of anxiety is crucial for developing effective student support strategies within universities. This analysis seeks to explore meaningful patterns within student survey responses that may guide future academic and well-being interventions.
The dataset was collected via a voluntary and anonymous online survey distributed to students enrolled in the DATA2X02 unit at the University of Sydney. Although the sample might initially appear random due to the open participation format, it is in fact non-random, and several types of bias must be considered.
First, selection bias is likely present, as students who chose to participate may be more engaged, self-reflective, or motivated — characteristics that are not necessarily representative of the broader student population. As a result, variables such as self-reported anxiety, study hours, and approaches to assignment management may be systematically skewed.
Second, non-response bias must also be acknowledged. Students who chose not to complete the survey could differ in significant ways from respondents, especially in terms of sleep consistency, WAM, or assignment submission patterns, leading to distorted estimates of key outcomes.
Third, the dataset is subject to measurement bias, as many questions rely on self-reported data, which are vulnerable to social desirability effects and recall inaccuracies. This is particularly relevant for sensitive or subjective variables such as alcohol consumption, sleep hours, and daily anxiety levels.
To improve data quality in future iterations, certain survey items could be redesigned. For example, the question about daily anxiety (Q15) currently uses a simple 1–10 scale, but lacks defined anchors or examples, which may cause participants to interpret the scale inconsistently. Adding clearly labelled scale intervals (e.g., “1 = not at all anxious,” “10 = extremely anxious multiple times per day”) would reduce ambiguity.
Despite these limitations, the dataset remains suitable for exploratory statistical analysis. This report addresses three focused research questions using techniques from Modules 1 and 2, including parametric and non-parametric hypothesis testing, as well as resampling methods. Specifically, we examine how gender, academic workload, and usual bedtime are associated with daily self-reported anxiety.
The calculations in this report were performed in R using the R Markdown environment to ensure reproducibility. Data cleaning was carried out using tidyverse packages. Column names were cleaned and shortened for clarity. Variables relevant to the research questions were retained, including self-reported anxiety frequency, study hours, gender, and usual bedtime. Numeric variables were coerced into appropriate formats. Gender values were standardised for consistency, with the gender comparison using two groups, Female and Male, and bedtimes were converted to numeric hour values for further analysis.
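The cleaning steps described above can be sketched in base R as follows. This is a minimal illustration, not the report's own pipeline: the toy responses are made up, the helper name `standardise_gender` is hypothetical, and base R is used in place of the tidyverse so the block runs on its own. The cut-offs (anxiety of 3 or below is Low, 7 or below is Mid, otherwise High; more than 20 study hours; bedtime from 22:00 up to 01:59 counted as Early) are the ones used in this report's analysis code.

```r
# Toy responses standing in for the survey export (values are made up).
raw <- data.frame(
  daily_anxiety_frequency = c("2", "5", "9", "7", NA),
  weekly_study_hours      = c("10", "25", "20", "35", "15"),
  gender                  = c("F", "male", "Woman ", "non-binary", "m"),
  usual_bedtime           = c("23:30", "01:15", "03:00", "22:00", "21:45"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map free-text gender responses to three labels.
standardise_gender <- function(x) {
  x <- tolower(trimws(x))
  out <- rep("Other", length(x))
  out[x %in% c("female", "f", "girl", "woman", "lady")] <- "Female"
  out[x %in% c("male", "m", "man", "bloke", "guy")] <- "Male"
  out[is.na(x) | x == ""] <- NA_character_
  out
}

# Coerce numeric columns, standardise gender, extract the bedtime hour.
clean <- within(raw, {
  daily_anxiety_frequency <- as.numeric(daily_anxiety_frequency)
  weekly_study_hours <- as.numeric(weekly_study_hours)
  gender <- standardise_gender(gender)
  sleep_hour <- as.integer(substr(usual_bedtime, 1, 2))
})

# Drop rows with any missing value among the analysis variables.
clean <- clean[complete.cases(clean), ]

# Derived grouping variables used in the three research questions.
clean$anxiety_group <- cut(
  clean$daily_anxiety_frequency,
  breaks = c(-Inf, 3, 7, Inf),
  labels = c("Low", "Mid", "High")
)
clean$study_group <- ifelse(clean$weekly_study_hours > 20, "> 20 hrs", "<= 20 hrs")
clean$sleep_group <- ifelse(clean$sleep_hour < 2 | clean$sleep_hour >= 22, "Early", "Late")

clean[, c("gender", "anxiety_group", "study_group", "sleep_group")]
```

Keeping the grouping rules in one place makes it easier to see that a bedtime of 01:15 is treated as Early while 03:00 is treated as Late.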
2 Results
2.1 Does the distribution of anxiety levels differ by gender? (anxiety grouped as Low, Mid, High)
The bar chart in Figure 2 suggests that the distribution of anxiety levels (Low, Mid, High) may differ across gender groups. To formally assess this, we performed a chi-squared test of homogeneity to test whether the proportions of anxiety groups are the same across gender categories. Note that in conducting a test of homogeneity, we assume that the responses were sampled independently from the population of students who identify with each gender.
1. Hypothesis
Null hypothesis (\(H_0\)):
The proportions of anxiety levels (Low, Mid, High) are the same for the Female and Male groups.
Alternative hypothesis (\(H_1\)):
The proportions of anxiety levels differ between the Female and Male groups.
2. Assumptions
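The observations are independent (each student belongs to only one gender group and one anxiety group).
The data are categorical and organised in a contingency table.
Expected counts in each cell should be at least 5 for the chi-squared approximation to be reliable.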
3. Test Statistic
Test type: Chi-squared test of homogeneity
Test statistic (\(\chi^2\)): 2.9237
Degrees of freedom (df): 2
4. P-Value
Chi-squared approximation p-value: 0.2318
5. Decision
Since the p-value is 0.2318, which is greater than the 0.05 significance level, we fail to reject the null hypothesis. There is insufficient evidence to suggest that the distribution of anxiety levels differs across gender groups.
6. Conclusion
The chi-squared test of homogeneity suggests that there is no statistically significant association between gender and daily anxiety level in this sample. However, small counts in some groups (e.g. the Other group, which was left out of the test) limit what the test can show, and a larger sample would be needed to draw firmer and more rigorous conclusions.
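To make the mechanics of this test concrete, the sketch below runs `chisq.test()` on a hypothetical 2 x 3 table. The counts are invented for illustration and are not the survey data; only the table shape (two gender groups by three anxiety groups) mirrors the analysis above. It shows where the 2 degrees of freedom come from, how the expected-count assumption can be checked, and how the statistic is built from observed and expected counts.

```r
# Hypothetical 2 x 3 table of counts (NOT the survey data).
tbl <- matrix(
  c(30, 80, 40,
    45, 90, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Female", "Male"),
    anxiety_group = c("Low", "Mid", "High")
  )
)

res <- chisq.test(tbl)

# Degrees of freedom: (rows - 1) * (columns - 1) = 1 * 2 = 2.
res$parameter

# Expected counts under H0; the approximation is usually trusted
# when every expected count is at least 5.
res$expected
all(res$expected >= 5)

# Recompute the statistic by hand: sum of (observed - expected)^2 / expected.
stat_by_hand <- sum((tbl - res$expected)^2 / res$expected)
all.equal(unname(res$statistic), stat_by_hand)
```

If any expected count fell below 5, a simulated p-value (`chisq.test(tbl, simulate.p.value = TRUE)`) would be one alternative to the chi-squared approximation.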
2.2 Do students who study more than 20 hours/week report lower anxiety levels than those who study less?
```r
library(ggplot2)
library(ggpubr)
library(patchwork)

# A. Boxplot + jitter (study group vs anxiety)
p1 <- ggplot(df_clean, aes(x = study_group, y = daily_anxiety_frequency, fill = study_group)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.3) +
  labs(
    x = "Study Group",
    y = "Daily Anxiety Frequency"
  ) +
  theme_minimal()

# B. QQ plot
p2 <- ggqqplot(df_clean, x = "daily_anxiety_frequency", facet.by = "study_group") +
  theme_minimal()

# Combine them
p1 + p2 + plot_annotation(tag_levels = "A")
```
<Figure 3. Visualisation of anxiety levels and normality across study hour groups>
A. Boxplots with jitter comparing daily anxiety frequency between students who study more than 20 hours and those who study 20 hours or less.
B. QQ plots assessing the normality of anxiety scores within each group. The points approximately follow the diagonal line, supporting the normality assumption of the two-sample t-test.
```r
t.test(daily_anxiety_frequency ~ study_group, data = df_clean, var.equal = TRUE)
```

```
	Two Sample t-test

data:  daily_anxiety_frequency by study_group
t = 2.0753, df = 341, p-value = 0.03871
alternative hypothesis: true difference in means between group > 20 hrs and group ≤ 20 hrs is not equal to 0
95 percent confidence interval:
 0.02879255 1.07387046
sample estimates:
mean in group > 20 hrs mean in group ≤ 20 hrs 
              5.745902               5.194570 
```
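As a sanity check on this output, the p-value and confidence interval can be recomputed from the reported t statistic, degrees of freedom, and group means alone. The sketch below uses only numbers copied from the output above; the variable names are illustrative. It also shows how the p-value changes with the choice of alternative, which matters because a positive t with the "> 20 hrs" group listed first means that group has the higher sample mean.

```r
# Values copied from the reported t.test() output.
t_stat  <- 2.0753
df      <- 341
mean_hi <- 5.745902  # mean anxiety, "> 20 hrs" group
mean_lo <- 5.194570  # mean anxiety, "<= 20 hrs" group

# Difference in means and the standard error implied by t = diff / SE.
diff_means <- mean_hi - mean_lo
se_diff <- diff_means / t_stat

# p-values under the three possible alternatives.
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_less      <- pt(t_stat, df)                      # H1: "> 20 hrs" mean is lower
p_greater   <- pt(t_stat, df, lower.tail = FALSE)  # H1: "> 20 hrs" mean is higher

# 95% confidence interval for the difference in means.
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff

round(c(two_sided = p_two_sided, less = p_less, greater = p_greater), 4)
round(ci, 4)
```

The two-sided value should agree with the reported p-value of 0.03871 up to rounding, while the one-sided "lower" alternative gives a p-value close to 1, since the observed difference points in the opposite direction.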
1. Hypothesis
Null hypothesis (\(H_0\)):
There is no difference in mean anxiety levels between students who study more than 20 hours per week and those who study 20 hours or less.
Alternative hypothesis (\(H_1\)):
Mean anxiety levels differ between the two study groups (two-sided, matching the test output above).
2. Assumptions
Observations are independent
Anxiety scores are approximately normally distributed within each study group
Variances are approximately equal across the two groups (judged from the boxplots)
3. Test Statistic
Test type: Two sample t-test
Test statistic (t): 2.0753
Degrees of freedom (df): 341
4. P-Value
P-Value = 0.03871
5. Decision
Since the p-value (0.03871) is less than 0.05, we reject the null hypothesis. There is evidence of a significant difference in mean anxiety level between the two study groups.
6. Conclusion
Students who study more than 20 hours per week report significantly higher daily anxiety levels, on average, than those who study 20 hours or less (sample means 5.75 versus 5.19), which is the opposite of the direction posed in the research question. While this association is statistically significant, the direction of causality remains unclear, and the association may be influenced by other unmeasured factors in student life.
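Because the test above relies on the equal-variance assumption, one possible sensitivity check is to compare the pooled (Student) t-test with Welch's t-test, which does not assume equal variances. The sketch below does this on simulated data; the data frame `toy`, its group sizes, and its means and standard deviations are assumptions chosen for illustration, not values from the survey.

```r
set.seed(1)

# Simulated stand-in for the two study groups (not the survey data).
toy <- data.frame(
  study_group = rep(c("> 20 hrs", "<= 20 hrs"), times = c(60, 120)),
  anxiety = c(
    rnorm(60, mean = 5.7, sd = 2.4),
    rnorm(120, mean = 5.2, sd = 2.2)
  )
)

# Numeric check of the equal-variance assumption: ratio of group SDs.
sds <- tapply(toy$anxiety, toy$study_group, sd)
sd_ratio <- max(sds) / min(sds)
sd_ratio

# Pooled-variance (Student) test versus Welch's test.
student <- t.test(anxiety ~ study_group, data = toy, var.equal = TRUE)
welch   <- t.test(anxiety ~ study_group, data = toy, var.equal = FALSE)

# Welch's degrees of freedom are never larger than n1 + n2 - 2.
c(student_df = unname(student$parameter), welch_df = unname(welch$parameter))
c(student_p = student$p.value, welch_p = welch$p.value)
```

When the two tests give similar p-values, the conclusion does not hinge on the equal-variance assumption; when they diverge, Welch's version is usually the safer one to report.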
2.3 Do early sleepers report lower anxiety levels than late sleepers?
```r
df_clean <- df_clean %>%
  mutate(
    sleep_group = if_else(sleep_hour < 2 | sleep_hour >= 22, "Early", "Late")
  )

set.seed(2025)

# Make a working copy of anxiety and group labels
anx <- df_clean$daily_anxiety_frequency
grp <- df_clean$sleep_group

# Observed difference in mean anxiety
obs_diff <- mean(anx[grp == "Early"]) - mean(anx[grp == "Late"])

# Fast permutation test
n_sim <- 10000
perm_diffs <- replicate(n_sim, {
  shuffled_grp <- sample(grp)
  mean(anx[shuffled_grp == "Early"]) - mean(anx[shuffled_grp == "Late"])
})

# Two-sided p-value
p_val <- mean(abs(perm_diffs) >= abs(obs_diff))
p_val
```

```
[1] 0.458
```
```r
library(tidyverse)

set.seed(2025)

# Define variables
anx <- df_clean$daily_anxiety_frequency
grp <- df_clean$sleep_group

# Calculate observed difference in mean anxiety (Early - Late)
obs_diff <- mean(anx[grp == "Early"]) - mean(anx[grp == "Late"])

# Permutation test
n_sim <- 10000
perm_diffs <- replicate(n_sim, {
  shuffled <- sample(grp)
  mean(anx[shuffled == "Early"]) - mean(anx[shuffled == "Late"])
})

# Calculate p-value (two-sided)
p_val <- mean(abs(perm_diffs) >= abs(obs_diff))

# Plot the null distribution with observed diff
perm_plot <- tibble(perm_diffs) %>%
  ggplot(aes(x = perm_diffs)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  geom_vline(xintercept = obs_diff, color = "red", linetype = "dashed", linewidth = 1.2) +
  annotate(
    "text",
    x = obs_diff + 0.05,
    y = max(table(cut(perm_diffs, 30))) * 0.9,
    label = paste0("Observed diff = ", round(obs_diff, 3)),
    color = "red",
    hjust = 0,
    size = 4
  ) +
  labs(
    title = "<Permutation Test: Difference in Mean Anxiety>",
    subtitle = paste0("p-value = ", round(p_val, 3)),
    x = "Mean difference (Early - Late sleepers)",
    y = "Frequency"
  ) +
  theme_minimal()

# Print the finalized plot
perm_plot
```
<Figure 4. Null distribution of mean anxiety differences between early and late sleepers, generated via 10,000 permutations. The red dashed line represents the observed difference in means (-0.194). Since the observed value lies well within the null distribution, the result is not statistically significant (p = 0.458).>
1. Hypothesis
Null hypothesis (\(H_0\)): There is no difference in mean daily anxiety levels between early and late sleepers.
Alternative hypothesis (\(H_1\)):
There is a difference in mean daily anxiety levels between early and late sleepers.
2. Assumptions
Observations are independent of each other.
Under the null hypothesis, anxiety scores are exchangeable between the early and late sleeper groups, with the anxiety scale read consistently by all respondents.
3. Test Statistic
Observed difference (Early - Late): -0.194
This indicates that, on average, early sleepers reported slightly lower anxiety (by 0.194 points), but the difference is very small.
4. P-Value
P-Value = 0.458
This p-value means that, if the null hypothesis is true, there is a 45.8% chance of observing a difference in mean anxiety at least as extreme as -0.194 (in either direction) by random chance alone. Since it is greater than the 0.05 significance level, the result is not statistically significant.
5. Decision
We fail to reject the null hypothesis.
6. Conclusion
In conclusion, there is no strong statistical evidence that early and late sleepers experience different anxiety levels. While the observed difference suggests a slightly lower mean anxiety score for early sleepers, the difference is small and well within what we would expect by chance alone. Hence, bedtime alone does not appear to be a key factor in daily anxiety levels among students in this survey sample.
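Since the permutation p-value is itself an estimate based on 10,000 random shuffles, it can help to quantify its simulation error. The sketch below treats the p-value as an estimated proportion and computes its Monte Carlo standard error; it uses only the reported p-value and number of permutations, and the "+1" corrected p-value is shown as a common alternative convention rather than the method used in this report.

```r
p_hat <- 0.458  # reported permutation p-value
n_sim <- 10000  # number of permutations used in the report

# Monte Carlo standard error of an estimated proportion.
mc_se <- sqrt(p_hat * (1 - p_hat) / n_sim)

# Approximate 95% interval for the p-value that an exhaustive
# permutation test would give.
mc_interval <- p_hat + c(-1, 1) * qnorm(0.975) * mc_se

# Alternative convention: count the observed statistic as one of the
# permutations, which guarantees the p-value is never exactly zero.
extreme_count <- round(p_hat * n_sim)
p_corrected <- (extreme_count + 1) / (n_sim + 1)

round(c(mc_se = mc_se, lower = mc_interval[1], upper = mc_interval[2]), 4)
round(p_corrected, 4)
```

With a standard error of roughly half a percentage point, the simulation noise is far too small to move the p-value anywhere near the 0.05 threshold, so the decision does not depend on the random seed.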
3 Final Conclusion
This report examined how daily anxiety levels relate to study habits, gender, and sleep patterns among DATA2X02 students at the University of Sydney. While the dataset was subject to selection and measurement biases due to its voluntary, self-reported nature, it still offered meaningful insights into student wellbeing.
This analysis produced three meaningful insights:
There was no significant association between gender and reported anxiety levels, suggesting that gender identity alone may not be a strong indicator of daily anxiety level.
Students who studied more than 20 hours per week reported significantly higher anxiety levels, on average, than those who studied 20 hours or less (two-sample t-test, p = 0.03871).
There was no significant difference in anxiety between early and late sleepers under the permutation resampling approach. While early sleepers showed a slightly lower average anxiety score, the difference was not meaningfully large. This suggests that bedtime alone is unlikely to explain differences in anxiety within this survey sample.
In sum, these results suggest that academic workload may be more closely related to student anxiety than demographic or lifestyle factors such as gender or bedtime. However, given the exploratory nature of the analysis and the limitations of the dataset, including small subgroup sizes and potential reporting inaccuracies, caution must be taken when generalising these findings.
Further studies could build on these results by using larger, more representative samples and by refining question phrasing to reduce measurement bias. In conclusion, understanding the behavioural patterns associated with anxiety is a crucial step toward designing effective mental health support for students.